A Risk Analysis of File Formats for Preservation Planning
Abstract
This paper presents an approach for automatically estimating the preservation risks of file formats. The main contribution is the definition of risk factors with associated severity levels, together with their automatic computation. Our goal is to use a solid knowledge base, automatically aggregated from linked open data repositories, as the basis for risk analysis in the digital preservation domain. The method is meant to facilitate decision making about the preservation of digital content in libraries and archives. We have developed a tool that aggregates rich and trusted file format descriptions. It exploits available linked data resources and uses expert models to infer knowledge about the long-term preservation of digital content. An ontology mapping technique collects information from the web of linked data and integrates it into a common representation. Furthermore, we employ AI techniques (i.e., expert rules and clustering) to infer explicit knowledge about the nature and preservation-friendliness of file formats. The evaluation part of the paper presents a statistical analysis of the aggregated information and a qualitative analysis of the aggregated knowledge. A Web service supports programmatic access to the format and risk analysis reports.
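The idea of risk factors with severity levels can be illustrated with a minimal sketch. The abstract does not list the actual factors, the severity scale, or the aggregation formula, so everything below is an assumption made for illustration: the factor names, the 1 to 3 severity scale, the property keys (`open_spec`, `tool_count`, `proprietary`, `mime_type`), and the severity-weighted ratio used as the score are all hypothetical and are not the paper's method.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Mapping


@dataclass(frozen=True)
class RiskFactor:
    """A hypothetical risk factor: a name, a severity, and a trigger rule."""
    name: str
    severity: int  # assumed scale: 1 (low) to 3 (high)
    is_risky: Callable[[Mapping[str, Any]], bool]


# Illustrative factors only; the paper's real factors are not given in the abstract.
FACTORS: List[RiskFactor] = [
    RiskFactor("no open specification", 3,
               lambda f: not f.get("open_spec", False)),
    RiskFactor("few supporting tools", 2,
               lambda f: int(f.get("tool_count", 0)) < 2),
    RiskFactor("proprietary format", 2,
               lambda f: bool(f.get("proprietary", False))),
    RiskFactor("no registered MIME type", 1,
               lambda f: not f.get("mime_type")),
]


def triggered_factors(fmt: Mapping[str, Any],
                      factors: List[RiskFactor] = FACTORS) -> List[str]:
    """Return the names of all factors whose rule fires for this format."""
    return [f.name for f in factors if f.is_risky(fmt)]


def risk_score(fmt: Mapping[str, Any],
               factors: List[RiskFactor] = FACTORS) -> float:
    """Severity-weighted share of triggered factors, between 0.0 and 1.0."""
    total = sum(f.severity for f in factors)
    triggered = sum(f.severity for f in factors if f.is_risky(fmt))
    return triggered / total


# Made-up format descriptions, standing in for aggregated linked-data records.
format_a = {"open_spec": True, "tool_count": 12,
            "proprietary": False, "mime_type": "application/x-format-a"}
format_b = {"open_spec": False, "tool_count": 1,
            "proprietary": True, "mime_type": None}

if __name__ == "__main__":
    for label, fmt in (("format_a", format_a), ("format_b", format_b)):
        print(label, risk_score(fmt), triggered_factors(fmt))
```

In this sketch each factor is a simple expert rule over an aggregated format description, and the score is the sum of triggered severities divided by the sum of all severities. A real system would draw the property values from the aggregated knowledge base rather than from hand-written dictionaries.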
Related resources
Connecting Preservation Planning And Plato With Digital Repository Interfaces
An accepted digital preservation workflow is emerging in which file formats are identified and those believed to be at risk are migrated to what are perceived to be less risky formats. This raises important questions about what to convert and when, if at all. In other words, how to connect file identification and migration. This area has become known as preservation planning, and seeks to take ...
Automatic Discovery of Preservation Alternatives Supported by Community Maintained Knowledge Bases
Preservation Planning, which deals with selecting the most appropriate preservation action to be applied to digital objects, is an important step in any digital preservation activity. Comprehensive Preservation Planning depends on the availability of identified alternatives of preservation actions, which are for example file format migrations to migrate data in an outdated format to one that ha...
An Automatic Bayesian Classification System for File Format Selection
This paper presents an approach for the classification of an unstructured format description for identification of file formats. The main contribution of this work is the employment of data mining techniques to support file format selection with just the unstructured text description that comprises the most important format features for a particular organisation. Subsequently, the file format i...
AONS II: continuing the trend towards preservation software 'Nirvana'
File format obsolescence is a major risk factor threatening the sustainability of and access to digital information. While the preservation community has become increasingly interested in tools for migration and transformation of file formats, the National Library of Australia is developing mechanisms specifically focused on monitoring and assessing the risks of file format obsolescence. This p...
Sustainability Assessments at the British Library: Formats, Frameworks, & Findings
File format assessments have been the subject of much debate in and outside of the preservation community in the past decade. Recognizing the unique structural, operational, and collecting context of the British Library, the Library’s digital preservation team recently initiated new format assessment work to deliver recommendations on which file formats will best enable the preservation of inte...